Re-identification is generally carried out by encoding the appearance of a subject in terms of outfit, presupposing scenarios where people do not change their attire. In this paper we overcome this restriction by proposing a framework based on a deep convolutional neural network, SOMAnet, that additionally models other discriminative aspects, namely, structural attributes of the human figure (e.g. height, obesity, gender). Our method is unique in many respects. First, SOMAnet is based on the Inception architecture, departing from the usual siamese framework. This spares expensive data preparation (pairing images across cameras) and allows inspection of what the network has learned. Second, and most notably, the training data consist of a synthetic 100K-instance dataset, SOMAset, created by photorealistic human body generation software. Synthetic data represent a good compromise between realistic imagery, usually not required in re-identification since surveillance cameras capture low-resolution silhouettes, and complete control of the samples, which is useful for customizing the data w.r.t. the surveillance scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on recent re-identification benchmarks, outperforms all competitors, matching subjects even with different apparel. The combination of synthetic data with Inception architectures opens up new research avenues in re-identification.